Unsupervised clustering identifies sub-phenotypes and reveals novel outcome predictors in patients with dialysis-requiring sepsis-associated acute kidney injury

Abstract Introduction Heterogeneity exists in sepsis-associated acute kidney injury (SA-AKI). This study aimed to perform unsupervised consensus clustering in critically ill patients with dialysis-requiring SA-AKI. Patients and Methods This prospective observational cohort study included all septic patients, defined by the Sepsis-3 criteria, with dialysis-requiring SA-AKI in surgical intensive care units in Taiwan between 2009 and 2018. We employed unsupervised consensus clustering based on 23 clinical variables upon initializing renal replacement therapy. Multivariate-adjusted Cox regression models and Fine–Gray sub-distribution hazard models were built to test associations between cluster memberships with mortality and being free of dialysis at 90 days after hospital discharge, respectively. Results Consensus clustering among 999 enrolled patients identified three sub-phenotypes characterized with distinct clinical manifestations upon renal replacement therapy initiation (n = 352, 396 and 251 in cluster 1, 2 and 3, respectively). They were followed for a median of 48 (interquartile range 9.5–128.5) days. Phenotypic cluster 1, featured by younger age, lower Charlson Comorbidity Index, higher baseline estimated glomerular filtration rate but with higher severity of acute illness was associated with an increased risk of death (adjusted hazard ratio of 3.05 [95% CI, 2.35–3.97]) and less probability to become free of dialysis (adjusted sub-distribution hazard ratio of 0.55 [95% CI, 0.38–0.8]) than cluster 3. By examining distinct features of the sub-phenotypes, we discovered that pre-dialysis hyperlactatemia ≥3.3 mmol/L was an independent outcome predictor. A clinical model developed to determine high-risk sub-phenotype 1 in this cohort (C-static 0.99) can identify a sub-phenotype with high in-hospital mortality risk (adjusted hazard ratio of 1.48 [95% CI, 1.25–1.74]) in another independent multi-centre SA-AKI cohort. Conclusions Our data-driven approach suggests sub-phenotypes with clinical relevance in dialysis-requiring SA-AKI and serves an outcome predictor. This strategy represents further development toward precision medicine in the definition of high-risk sub-phenotype in patients with SA-AKI. Key messages Unsupervised consensus clustering can identify sub-phenotypes of patients with SA-AKI and provide a risk prediction. Examining the features of patient heterogeneity contributes to the discovery of serum lactate levels ≥ 3.3 mmol/L upon initializing RRT as an independent outcome predictor. This data-driven approach can be useful for prognostication and lead to a better understanding of therapeutic strategies in heterogeneous clinical syndromes.

Introduction underwent dialysis during SA-AKI had a much higher risk of mortality [2] and lower possibility of kidney recovery [6]. Recent studies reported that 90-day mortality in patients with dialysis-requiring SA-AKI ranged from 45.2% to 81.2% [7][8][9][10]. Therefore, this population represents the highest risk of adverse clinical outcomes.
Both sepsis and AKI are highly heterogenous syndromes [11,12]. It is not surprising that several reports addressed the presence of heterogeneity in aetiology, disease trajectory, prognosis and outcomes among patients with SA-AKI [13,14]. Furthermore, diverse pathophysiological mechanisms of SA-AKI are not well-understood yet [15,16]. These features make clinical managements of SA-AKI not optimal or standardized [3,17]. Unsupervised clustering is an agnostic class discovery largely used in basic experiments and genetic studies [18]. This data-driven approach has been successfully applied in clinical studies to identify distinct sub-phenotypes in diseases with high heterogeneity, such as heart failure [19], hypertension [20], diabetes [21], sepsis [22] and chronic kidney disease [23]. Literature suggests unsupervised clustering can identify sub-phenotypes for outcome predictions in patients with SA-AKI [24][25][26]. However, none of these clinical studies further deciphered previously unrecognized markers or mechanisms hidden behind the sub-phenotypes of SA-AKI, while basic research always did so [27,28].
We hypothesized that agnostically clustering patients with dialysis-requiring SA-AKI can identify distinct subpopulations with different clinical relevance. This study was designed to examine phenotype heterogeneity in critically ill patients with dialysis-requiring SA-AKI using consensus clustering of multidimensional clinical data. Understanding potential predictors of clinical outcomes may facilitate appropriate intervention and contribute to early shared decision-making for caring patients with SA-AKI.

Study cohort
The main study cohort, National Taiwan University Hospital Study Group on Acute Renal Failure (NSARF) database contains prospectively collected data from patients with AKI during surgical intensive care unit (ICU) admission [29][30][31][32]. Enrolled patients were admitted to surgical services for perioperative care or other unstable medical conditions due to sepsis. The study was complied with the Helsinki Declaration and approved by the Institutional Review Board of National Taiwan University Hospital (201407076RINA). Written informed consent was waived because there was no breach of privacy or interference with patient care.
In this observational cohort study, we investigated patients with dialysis-requiring SA-AKI, defined by the simultaneous presence of both sepsis and AKI [33]. We first screened patients aged 18 years and above in the database who encountered an episode of dialysis-requiring AKI and had suspected concurrent infection between January 1, 2009 and December 31, 2018. Dialysis-requiring AKI was defined as the need to receive renal replacement therapy (RRT) during the index AKI according to the Kidney Disease: Improving Global Outcomes AKI criteria [34]. Patients who had pre-existing end-stage kidney disease, were kidney transplant recipients or had ever received any RRT were excluded from the study. Infection was suspected as the combination of usage of systemic antibiotics and positive microbiology culture from body fluids. Sepsis was defined according to the third international consensus definition for sepsis and septic shock (Sepsis-3) [35]. To be classified as having sepsis, patients with a suspected or confirmed infection should have met at least two quick Sequential Organ Failure Assessment (SOFA) criteria and have an acute increase in total SOFA score by ≥2 within the 24-h period before RRT was started. The baseline SOFA component was considered zero unless the patient has pre-existing organ dysfunction before the onset of infection [35]. Patients with chronic hepatic dysfunction and chronic respiratory impairment were assigned a baseline SOFA score of 4 and 2, respectively [36].

Clinical assessments and dialysis initiation
All demographic and clinical data were prospectively collected on a standardized form. Baseline serum creatinine was defined as the lowest value measured within 180 days before the index admission, or the lowest value before dialysis during the index admission if the patient had no serum creatinine measurement within the previous 180-day period [30,37]. Baseline estimated glomerular filtration rate (eGFR) was calculated by the four-variable Modification of Diet in Renal Disease equation [38]. The worst physiological and laboratory values during the 24-h period upon initializing RRT were collected. Missing data was imputed using Multivariate Imputation by Chained Equation.
The doses of inotropic therapies were expressed as inotropic equivalent (IE) (μg/kg/min) [39]. The details of the collected parameters are shown in Table 1 and Supplemental Methods. RRT was initiated according to the pre-determined indications detailed in the Supplemental Methods.

Outcomes
This dataset also leveraged an electronic medical record to continuously recorded data for outcome analysis when patients visited our outpatient department after the index hospitalization. The outcomes of interest were all-cause mortality and being free of dialysis upon 90 days after hospital discharge. All patients were followed up since the initialization of RRT until death, 90 days after hospital discharge or 180 days after the first dialysis, whichever came first. We considered death after weaning off dialysis as mortality, rather than being free of dialysis, because biological kidney recovery without survival is not patient-centred [40].

Statistical analysis
Continuous variables were presented as mean ± standard deviation, whereas categorical variables were presented as numbers and percentages, unless otherwise stated. To examine the heterogeneity of the patients, we applied consensus clustering based on 23 continuous parameters, namely age, Charlson Comorbidity Index (CCI), baseline eGFR, and other clinical variables upon initializing RRT (all the variables in Table 1B). Clustering was performed using the Gower distance metric and partitioning around medoid algorithm. The optimal cluster size was determined using consensus matrix heat maps, within-cluster consensus scores and delta area plots of the consensus cumulative distribution function [18].
Characteristics were compared among the groups using one-way analysis of variance, the Kruskal-Wallis test, the two-sample t-test, the Mann-Whitney U-test and the chi-square test, as indicated. We plotted Kaplan-Meier curves and built multivariate Cox regression models to determine associations between patient mortality and cluster memberships and clinical variables. Since death is a competing event for kidney recovery, we performed competing risk analysis. Cluster membership and clinical variables were analysed using multivariate Fine-Gray sub-distribution hazard models for being free of dialysis. The cumulative incident function (CIF) was used to estimate the probability of being free of dialysis while treating death as a competing risk [41]. To indicate the implications of mortality against lactate levels, a generalized additive mode incorporating subject-specific (longitudinal) random effects and adjusted for clinical factors was plotted. Furthermore, stepwise logistic regression modelling was applied to establish clinical models for identifying the cluster memberships. All tests were two-tailed with significance defined by P values of less than 0.05. Statistical analyses were performed using R 4.

Characteristics of the patients
Among the 2488 patients who received RRT and suspected infection during the study period, 909 were excluded because they had ever received acute or maintenance dialysis before the index ICU admission. Of the remaining 1579 patients, 999 who had a quick SOFA score of 2 or higher and had an acute increase in SOFA score of 2 or higher were enrolled in the study (Figure 1). Demographic characteristics were listed in   [17][18][19][20][21][22][23][24][25], and the median SOFA score was 13 (IQR 10-15).

Dialysis-requiring SA-AKI sub-phenotypes
Consensus clustering, with cluster size ranging from 2 to 6, was performed to agnostically identify distinct subpopulations of patients (Supplemental Figure S1A-E). Clustering when the cluster size was more than 3 generated one or two clusters with a relative lower mean consensus value, indicating less stability of the cluster membership (Supplemental Figure S1F). The changes in the area under the consensus cumulative distribution function curve did not conspicuously increase when the cluster size was more than 3 (Supplemental Figure S2) The distribution of most baseline characteristics was significantly different across the three clusters, indicating that sub-phenotypes existed in the cohort (Table 1, Supplemental Figure S3 and S4). Figure 2 shows the standardized difference in the baseline characteristics according to each cluster. Key features of the clusters were depicted by having an absolute standardized difference of ≥0.3. Cluster 1 included individuals with favourable underlying conditions, that is, younger age, lower CCI and higher baseline eGFR. Nevertheless, acute clinical status upon initializing RRT was worst in cluster 1, as observed by their highest likelihood of having shock, lowest Glasgow Coma Scale (GCS) and PaO 2 to fraction of inspired oxygen ratio and higher serum lactate, APACH II and SOFA scores. A higher proportion of them received mechanical ventilation, extracorporeal membrane oxygenation and CRRT as their first RRT. In contrast, cluster 3 comprised individuals with older age, had a higher burden of comorbidities, and lower baseline eGFR. However, the severity of acute illness seems to be lower, suggested by their lower serum lactate, IE and SOFA scores upon initializing RRT. Compared with other clusters, patients in cluster 3 were less likely to receive cardiovascular surgery, but were more likely to initiate RRT due to azotaemia with symptoms and received intermittent haemodialysis as their first RRT. Figure 3 illustrates each patient's baseline eGFR and SOFA score upon initializing RRT. Most patients in cluster 1 (80.4%) and cluster 2 (70.45%) had baseline eGFR of 45 ml/min/1.73m 2 or higher, while 79.28% patients in cluster 3 had baseline eGFR lower than 45 ml/ min/1.73m 2 . There were 84.38%, 51.26%, and 25.5% patients in cluster 1, 2, and 3 had a SOFA score ≥13, respectively.
The crude rate of being free of dialysis 90 days after hospital discharge was different among clusters ( (Figure 4(B)). After adjusting for age, sex, baseline eGFR and CCI in the Fine-Gray sub-distribution hazard model, patients in clusters 1 (adjusted sub-distribution hazard ratio [sHR], 0.55; 95% CI, 0.38-0.8), but not in cluster 2 (adjusted sHR 0.9; 95% CI, 0.64-1.25), were less likely to become free of dialysis than those in cluster 3.
We performed another supplementary clustering by principal component analysis of the same 23 3) for each cluster are marked in red. dM: diabetes mellitus; HTn: hypertension; cAd: coronary artery disease; adv_cHf: advanced heart failure: defined by new York Heart Association functional class iii or iV; charlson: charlson comorbidity index; baseGfR: baseline estimated glomerular filtration rate; cVs: cardiovascular surgery; TPn: total parenteral nutrition; cPR: cardiopulmonary resuscitation; ecMo: extracorporeal membrane oxygenation; ≤day2: first RRT within 2 days of hospital admission; cRRT: continuous renal replacement therapy; sled: sustained low efficiency dialysis; iHd: intermittent haemodialysis; logUo: logarithm of urine volume; Gcs: Glasgow coma scale; PrediaBW: body weight before first dialysis; BT: body temperature; HR: heart rate; MAP: mean arterial pressure; logPfR: logarithm of the Pao2 to fraction of inspired oxygen ratio; BUn: blood urea nitrogen; cre: serum creatinine; na: sodium; K: potassium; WBc: white blood cells; Hb: haemoglobin; Plt: platelet count; Hco3: bicarbonate; ie: inotropic equivalent; APAcH: Acute Physiology and chronic Health evaluation ii score; sofA: sequential organ failure assessment score. parameters using the k-nearest neighbour graph structure and Louvain algorithm [28,42]. This approach identified two clusters and revealed a similar observation: a cluster featured by younger age, lower CCI, higher baseline eGFR but with greater severity of acute illness was associated with an increased risk of death; meanwhile, another cluster featured by older age, higher CCI and lower baseline eGFR, but with less severity of acute illness, was associated with better survival ( Figure 5).

Identification of high-risk sub-phenotype
For clinical application, we grouped cluster 2 and 3 together and determined whether a set of clinical variables listed in Table 1 could identify cluster 1, which has a highest risk for mortality and becoming dialysis-dependent. Using stepwise logistic regression, a model composited of 11 variables was developed to accurately identify high-risk sub-phenotype 1, with a C-static 0.99 (Table 2, Supplemental Figure S5). At the cut-off point with optimal Youden index, this model achieved a sensitivity of 95.74%, specificity of 95.83%, positive predictive value of 97.64%, and negative predictive value of 92.59% in the main study cohort.

Pre-dialysis hyperlactatemia ≥3.3 mmol/L predicts poor outcomes
Through the cluster analysis, we identified a sub-phenotype (cluster 1) with poor clinical outcomes and was characterized by a notably higher serum lactate level upon initializing RRT (Figure 2). This association was also evident in the supplementary clustering analyses (Figure 5(G)). Thus, we examined the association of serum lactate level with clinical outcomes. By applying the generalized additive mode with adjustment for age, sex, baseline eGFR, CCI and mean arterial pressure upon initializing RRT, the estimated probability of death augmented when the serum lactate level was equal to or more than 3.3 mmol/L (Supplemental Figure S6A). Multivariable Cox proportional hazards analysis revealed that serum lactate levels of ≥3.3 mmol/L upon initializing RRT independently predict all-cause mortality (adjusted HR, 1.34; 95% CI, 1.09-1.65) (Supplemental Table S1). After controlling for mortality as a competing risk, patients who had hyperlactatemia of ≥3.3 mmol/L upon initializing RRT were less likely to become free of dialysis (adjusted sHR, 0.69; 95% CI, 0.51-0.95) (Supplemental Table S2). Supplemental Figure S6B and C demonstrate that patients with pre-dialysis hyperlactatemia of ≥3.3 mmol/L had poor survival and lower probability of being free of dialysis.

External cohort
To extrapolate our findings, we analysed a prospective, observational, multi-centre database of the Nationwide Epidemiology and Prognosis of Dialysis-requiring AKI (NEP-AKI-D) study [5,7]. The NEP-AKI-D study was conducted in ICUs of the 30 participating hospitals in Taiwan. In our previous report, this dataset included 23% of surgical patients and 77% of medical patients [43]. We extracted a subset of 898 patients who encountered dialysis-requiring SA-AKI and had available serum lactate data upon initializing RRT between 2014 and 2016 (Table 1, Figure 6(A), and Supplemental Methods). After a median follow-up of 17.5 (IQR,  days, all-cause hospital mortality occurred in 588 (65.48%).
By applying the clinical model derived from the main study cohort (Table 2), 319 (35.53%) patients in the external cohort were classified as high-risk sub-phenotype. The Kaplan-Meier curves showed that high-risk sub-phenotype was associated with lower survival upon hospital discharge (log rank p < 0.001) (Figure 6(B)). After adjusting for age, sex, baseline eGFR and CCI, Cox hazard analysis showed that high-risk sub-phenotype assignment was associated with an increased risk of in-hospital mortality (adjusted HR, 1.48; 95% CI, 1.25-1.74).

Discussion
There is no specific effective therapy for AKI, partly due to its diverse clinical presentations and not well-understood pathophysiology. One personalized approach is the phenotypical-and/or biomarkers-based managements to prevent AKI or to timely treat AKI [24,44,45]. This study presents another approach to inform the planning of managements at the time of RRT initiation, when critical treatment decisions are often being made. We performed consensus clustering in critically ill patients with dialysis-requiring SA-AKI and identified phenotypic clusters. Importantly, the revealed sub-phenotypes showed relevant association with patient mortality and being free of dialysis. Moreover, this data-driven approach led us to discover hyperlactatemia ≥3.3 mmol/L to be an independent outcome predictor.
Recently, clustering algorithms are increasingly used to figure out heterogeneity in AKI based on patients' demographic features, clinical data, laboratory values and/or biomarkers at ICU admission [24][25][26]. We focused on patients suffering with the most severe form of AKI, namely dialysis-requiring SA-AKI. Consistent with previous reports, our results highlighted the clinical utility of using unsupervised cluster discovery to explore clinically meaningful sub-phenotypes. Of note, although patients in cluster 1 were younger, had lower CCI and had higher baseline eGFR, they had worse disease severities upon initializing RRT and poor clinical outcomes. Similarly, Chaudhary and colleagues have previously revealed a sub-phenotype in which patients were the youngest, had lowest proportions of hypertension, congestive heart failure, and diabetes, but had most severe acute illness and highest mortality [26]. This suggests that the impact of acute-illness severity outweighs that of baseline general condition in patients with dialysis-requiring SA-AKI.
Compared with prior studies on non-dialysis-requiring SA-AKI [24][25][26], our study revealed few different findings. First, baseline severity of acute illness (APACH II and SOFA scores, Table 1B) and mortality were much higher in our patients with dialysis-requiring SA-AKI. Even in this extremely vulnerable population, heterogeneity was still present. There was a 'relatively less risky' sub-phenotype 3 and a 'very high-risk' sub-phenotype 1. Our developed model for identifying high-risk sub-phenotype in dialysis-requiring SA-AKI could be applied in clinical care, such as performing bedside risk calculations by APP-or web-based risk calculator. It may be useful to estimate patient prognosis and help clinical decision-making. Additionally, we demonstrated that patients in sub-phenotype 3 had worse baseline eGFR (Table 1A) but had better outcomes compared with those in sub-phenotype 1. It is possible that encountering dialysis-required SA-AKI in patients with higher baseline kidney function represents a more significant loss of functioning nephrons due to more severe injury, compared to those with poor baseline kidney function. This is compatible with clinical observations that outcomes of severe dialysis-requiring AKI versus underlying advance chronic kidney disease needing dialysis in ICU are very different [46,47]. Furthermore, in our study,  sub-phenotype 1 included more patients receiving first RRT within 2 days of hospital admission (Table 1A and Supplemental Figure 3 W) and had worst outcome. Conversely, Wonnacott and colleagues showed that patient with community-acquired AKI had better clinical outcomes than those with hospital-acquired AKI [48]. This discrepancy may be related to different patient profiles between the studies. Besides, there is difficulty in identifying the exact onset of AKI in sepsis [15]. Those who received RRT later than 2 days of hospitalization might have encountered kidney injury earlier in the course of sepsis.
By constellating patients' comorbidity profiles and acute-illness severities upon initializing RRT, the cluster membership provided a simple prediction of outcomes. Our data suggests that sub-phenotypes in ICU patients with dialysis-requiring SA-AKI may respond differently to 'usual care' . Interestingly, Bhatraju and colleagues have demonstrated that a certain biomarker-based sub-phenotype of AKI benefited from vasopressin therapy, while others did not [24]. Similarly to research in AKI, differential treatment response to fluid management strategy between sub-phenotypes of acute respiratory distress syndrome has also been reported [49]. This clustering approach can be used as a practical measure for prognostication, help to tailor therapeutic strategies, and facilitate precise patient recruitment in future clinical trials.
Another advantage of unsupervised clustering is its potential to discover previously unrecognized or  unconfirmed predictive factors [25]. It has been reported that hyperlactatemia is an independent mortality predictor in patients with SA-AKI in the emergency department [50], postoperative AKI requiring RRT [29] and SA-AKI requiring CRRT [9]. However, other studies have shown that the initial lactate level is not independently associated with death in patients with SA-AKI requiring CRRT [10,51] or post-cardiovascular surgery AKI [32]. Moreover, in the literature, serum lactate has never been reported as an independent predictor of kidney recovery after AKI [6,29,52,53]. In this study, consensus clustering demonstrated that serum lactate levels upon initializing RRT were a disparate feature among clusters (Figure 2). Patients with serum lactate levels of ≥3.3 mmol/L upon initializing RRT had not only a higher risk of mortality but also a higher risk of dialysis dependence, independent of blood pressure, the dose of vasoactive agents and disease severity (Supplemental Table S1 and S2). Whether hyperlactatemia of ≥3.3 mmol/L, rather than the traditional threshold of 2 mmol/L, should be considered an indication of dialysis initiation or surrogate of disease severity for SA-AKI requires further clinical investigations.
This study has limitations. First, we did not include circulating and/or urinary biomarkers in unsupervised clustering. Prior studies have shown that the accuracy of identifying sub-phenotypes with distinct underlying pathophysiology would be improved by incorporating biomarkers of inflammation, endothelial dysfunction, renal tubular injury and cell cycle arrest [24,25,52,54]. Although all enrolled patients in this study had clinically apparent SA-AKI, the exact pathophysiological mechanisms on AKI were not adjudicated. Since biomarker-guided clinical managements are getting popular, we advocate future investigations to characterize these biomarkers across SA-AKI sub-phenotypes. Second, we investigated patients with dialysis-requiring SA-AKI in the same ethnical ICU. Therefore, the findings of the study cannot be generalized to less severe non-dialysis-requiring AKI or other populations with differently distributed features. Third, we used the worst clinical values measured during the 24-h period before RRT initiation. This may have led to selection bias and misclassification bias because biochemical and physiological values always change rapidly in critically ill patients. Our data-driven cluster analysis approach directly relied on the input of the data. Using different inclusive variables at different time points may result in discrete sub-phenotypes. Lastly, the observational nature of this study without a pre-specified protocol of intervention cannot conclude any causal relationship. We are unable to explore whether sub-phenotypes in dialysis-requiring SA-AKI response differently to a specific treatment. Whether sub-phenotypes of patients and/or pre-dialysis hyperlactatemia would permit earlier intervention and the mitigation of mortality is unclear and requires further evaluations.

Conclusions
Unsupervised consensus clustering in patients with dialysis-requiring SA-AKI revealed sub-phenotypes with clinical relevance and serves a novel outcome predictor. This data-driven approach also led us to discover that serum lactate levels of 3.3 mmol/L or more upon initializing RRT is an independent prognostic factor. A developed model for identifying high-risk sub-phenotype based on clinical variables upon RRT initiation could be applied in clinical care. Future researches of SA-AKI sub-phenotypes are warranted to characterize biomarkers, to investigate treatment responses, and to validate in other populations. This strategy represents further development toward precision medicine.

Data availability statement
The data that support the findings of this study are available from the corresponding author (Prof. Vin-Cent Wu, e-mail q91421028@ntu.edu.tw) on reasonable request.